human transcription
What Do Humans Hear When Interacting? Experiments on Selective Listening for Evaluating ASR of Spoken Dialogue Systems
Mori, Kiyotada, Kawano, Seiya, Liu, Chaoran, Ishi, Carlos Toshinori, Contreras, Angel Fernando Garcia, Yoshino, Koichiro
Spoken dialogue systems (SDSs) use automatic speech recognition (ASR) at the front end of their pipeline. The role of ASR in an SDS is to appropriately recognize the information in user speech that is relevant to response generation. Examining human selective listening, the ability to focus on and attend to the important parts of a conversation during speech, will enable us to identify the ASR capabilities required for SDSs and to evaluate them. In this study, we experimentally confirmed selective listening by comparing the transcriptions humans produced while generating dialogue responses with reference transcriptions. Based on our experimental results, we discuss the possibility of a new ASR evaluation method that leverages human selective listening and can identify the gap in transcription ability between ASR systems and humans.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
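A rough sketch of the comparison the abstract above describes, under the assumption that selective listening shows up as words present in the reference transcription but dropped from the response-oriented one; the alignment below is a generic difflib word alignment, not the authors' procedure:

```python
# Sketch: locate words a listener omitted when transcribing only what
# mattered for their response. A hypothetical illustration of the
# paper's idea; the authors' actual alignment method is not specified here.
from difflib import SequenceMatcher

def omitted_words(reference: str, selective: str) -> list[str]:
    """Return reference words missing from the selective transcription."""
    ref_toks = reference.lower().split()
    sel_toks = selective.lower().split()
    matcher = SequenceMatcher(a=ref_toks, b=sel_toks, autojunk=False)
    omitted = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            omitted.extend(ref_toks[i1:i2])
    return omitted

reference = "well um i think the meeting should maybe start at ten tomorrow"
selective = "the meeting should start at ten tomorrow"
print(omitted_words(reference, selective))
# -> ['well', 'um', 'i', 'think', 'maybe']  (fillers and hedges dropped)
```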
Style-agnostic evaluation of ASR using multiple reference transcripts
McNamara, Quinten, Fernández, Miguel Ángel del Río, Bhandari, Nishchal, Ratajczak, Martin, Chen, Danny, Miller, Corey, Jetté, Migüel
Word error rate (WER) as a metric has a variety of limitations that have long plagued the field of speech recognition. Evaluation datasets suffer from varying style, formality, and the inherent ambiguity of the transcription task. In this work, we attempt to mitigate some of these differences by performing style-agnostic evaluation of ASR systems using multiple references transcribed under opposing style parameters. As a result, we find that existing WER reports likely significantly overestimate the number of contentful errors made by state-of-the-art ASR systems. In addition, we find our multi-reference method to be a useful mechanism for comparing the quality of ASR models that differ in the stylistic makeup of their training data and target task.
- North America > United States > Pennsylvania (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
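One plausible minimal reduction of the idea above: score a hypothesis against several references transcribed under different style guides and keep the most favorable WER, so purely stylistic divergence is not penalized. The jiwer package does the WER computation; the min-over-references aggregation is an assumption, not necessarily the paper's exact method:

```python
# Sketch: score a hypothesis against several stylistic variants of the
# reference and keep the most favorable WER, so purely stylistic choices
# (verbatim fillers, numerals vs. words) are not counted as errors.
# Requires: pip install jiwer
import jiwer

def multi_reference_wer(references: list[str], hypothesis: str) -> float:
    """Minimum WER over all acceptable reference styles."""
    return min(jiwer.wer(ref, hypothesis) for ref in references)

references = [
    "i have got twenty five dollars",   # verbatim style, numbers written out
    "I've got $25.",                    # formatted, written style
]
hypothesis = "I've got $25."
print(multi_reference_wer(references, hypothesis))  # -> 0.0
```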
Developing an End-to-End Framework for Predicting the Social Communication Severity Scores of Children with Autism Spectrum Disorder
Mun, Jihyun, Kim, Sunhee, Chung, Minhwa
Autism Spectrum Disorder (ASD) is a lifelong condition that significantly influences an individual's communication abilities and social interactions. Early diagnosis and intervention are critical due to the profound impact of ASD's characteristic behaviors on foundational developmental stages. However, the limitations of standardized diagnostic tools necessitate the development of objective and precise diagnostic methodologies. This paper proposes an end-to-end framework for automatically predicting the social communication severity of children with ASD from raw speech data. The framework incorporates an automatic speech recognition model, fine-tuned with speech data from children with ASD, followed by fine-tuned pre-trained language models that generate a final prediction score. Achieving a Pearson correlation coefficient of 0.6566 with human-rated scores, the proposed method demonstrates its potential as an accessible and objective tool for ASD assessment.
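The framework's final evaluation step, correlating predicted severity scores with human ratings, might look like the following; the score arrays are made-up placeholders, not the paper's data:

```python
# Sketch: the evaluation step of the proposed pipeline -- correlating
# model-predicted severity scores with human-rated ones. The arrays
# here are made-up placeholders, not the paper's data.
from scipy.stats import pearsonr

human_scores = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0]   # clinician ratings (placeholder)
model_scores = [2.2, 3.1, 1.4, 3.8, 2.9, 2.7]   # pipeline predictions (placeholder)

r, p_value = pearsonr(human_scores, model_scores)
print(f"Pearson r = {r:.4f} (p = {p_value:.3f})")
```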
HTEC: Human Transcription Error Correction
Sun, Hanbo, Gao, Jian, Wu, Xiaomin, Fang, Anjie, Cao, Cheng, Du, Zheng
High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. A recent study (LibriCrowd) found that every 1% increase in transcription word error rate (WER) raises the WER of ASR models trained on those transcriptions by approximately 2%. Transcription errors are inevitable even for highly trained annotators. However, few studies have explored human transcription correction, and error correction methods for related problems, such as ASR error correction and grammatical error correction, do not perform well enough on this task. We therefore propose HTEC, for Human Transcription Error Correction. HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills the masked positions. We propose a holistic list of correction operations, including four novel operations handling deletion errors, and a variant of embeddings that incorporates phoneme information into the transformer's input. HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% to 4.5% in WER. Finally, we deployed HTEC to assist human annotators and showed that it is particularly effective as a co-pilot, improving transcription quality by 15.1% without sacrificing transcription velocity.
- Oceania > Australia (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York (0.04)
- (4 more...)
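The two-stage shape of HTEC (detect and mask, then fill) can be approximated with off-the-shelf components. In the sketch below, a toy detector stands in for Trans-Checker and a generic BERT fill-mask pipeline stands in for Trans-Filler; neither is the authors' trained model:

```python
# Sketch of HTEC's two-stage shape: (1) a detector masks suspect words,
# (2) a generative model fills the masks. An off-the-shelf BERT
# fill-mask pipeline stands in for Trans-Filler, and hand-picked suspect
# indices stand in for Trans-Checker -- both are stand-ins, not the
# authors' trained models. Requires: pip install transformers torch
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def correct_transcript(words: list[str], suspect: set[int]) -> list[str]:
    """Mask each suspect position and replace it with the filler's top guess."""
    corrected = list(words)
    for i in sorted(suspect):
        masked = corrected[:i] + [fill.tokenizer.mask_token] + corrected[i + 1:]
        best = fill(" ".join(masked))[0]          # highest-scoring candidate
        corrected[i] = best["token_str"].strip()
    return corrected

words = "please weight for the next announcement".split()
print(correct_transcript(words, suspect={1}))    # index 1: "weight" -> likely "wait"
```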
TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
Behre, Piyush, Tan, Sharman, Shah, Amy, Kesavamoorthy, Harini, Chang, Shuangyu, Zuo, Fei, Basoglu, Chris, Pathak, Sayan
Punctuation and segmentation are key to readability in automatic speech recognition (ASR), yet they are often evaluated using F1 scores, which require high-quality human transcripts and do not reflect readability well. Human evaluation is expensive, time-consuming, and suffers from large inter-observer variability, especially for conversational speech devoid of strict grammatical structure. Large pre-trained models, however, capture a notion of grammatical structure. We present TRScore, a novel readability measure that uses the GPT model to evaluate different segmentation and punctuation systems. We validate our approach with human experts. Additionally, our approach enables quantitative assessment of the effect of text post-processing techniques, such as capitalization, inverse text normalization (ITN), and disfluency removal, on overall readability, which traditional word error rate (WER) and slot error rate (SER) metrics fail to capture. TRScore is strongly correlated with traditional F1 and human readability scores, with Pearson correlation coefficients of 0.67 and 0.98, respectively. It also eliminates the need for human transcriptions in model selection.
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
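The core mechanism above, prompting a GPT model to judge transcript readability, might be sketched as follows; the prompt wording, the 1-to-5 scale, and the choice of OpenAI's chat API are illustrative assumptions, not the paper's setup:

```python
# Sketch of the TRScore idea: ask a large language model to rate the
# readability of a punctuated, segmented transcript. The prompt, the
# 1-5 scale, the model name, and the use of OpenAI's chat API are
# assumptions for illustration -- the paper's actual GPT setup may differ.
# Requires: pip install openai, plus an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def readability_score(transcript: str) -> str:
    prompt = (
        "Rate the readability of this ASR transcript's punctuation and "
        "segmentation on a scale of 1 (unreadable) to 5 (perfectly "
        f"readable). Reply with the number only.\n\n{transcript}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

print(readability_score("so um we met at three. and then? we left early"))
```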
Who Decides if AI is Fair? The Labels Problem in Algorithmic Auditing
Mishra, Abhilash, Gorana, Yash
Labelled "ground truth" datasets are routinely used to evaluate and audit AI algorithms applied in high-stakes settings. However, there do not exist widely accepted benchmarks for the quality of labels in these datasets. We provide empirical evidence that quality of labels can significantly distort the results of algorithmic audits in real-world settings. Using data annotators typically hired by AI firms in India, we show that fidelity of the ground truth data can lead to spurious differences in performance of ASRs between urban and rural populations. After a rigorous, albeit expensive, label cleaning process, these disparities between groups disappear. Our findings highlight how trade-offs between label quality and data annotation costs can complicate algorithmic audits in practice. They also emphasize the need for development of consensus-driven, widely accepted benchmarks for label quality.
- Asia > India (0.29)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
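The audit's central measurement, per-group ASR error rates computed against noisy versus cleaned reference labels, can be sketched as follows; the two-sample dataset is fabricated purely to illustrate how reference quality alone can manufacture a disparity:

```python
# Sketch of the audit's core measurement: per-group WER computed once
# against noisy reference labels and once against cleaned ones. The
# tiny dataset is made up to show how reference quality alone can
# manufacture an urban/rural "disparity". Requires: pip install jiwer
import jiwer

def group_wer(samples: list[dict], ref_key: str) -> dict[str, float]:
    """Average WER per group against the chosen reference transcriptions."""
    by_group: dict[str, list[float]] = {}
    for s in samples:
        by_group.setdefault(s["group"], []).append(jiwer.wer(s[ref_key], s["hyp"]))
    return {g: sum(v) / len(v) for g, v in by_group.items()}

samples = [
    {"group": "urban", "hyp": "turn on the lights",
     "noisy_ref": "turn on the lights", "clean_ref": "turn on the lights"},
    {"group": "rural", "hyp": "turn on the lights",
     "noisy_ref": "turn on the light",  "clean_ref": "turn on the lights"},
]
print("noisy refs:", group_wer(samples, "noisy_ref"))   # rural looks worse
print("clean refs:", group_wer(samples, "clean_ref"))   # disparity disappears
```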
Facebook's latest AI can learn speech without human transcriptions
Speech recognition is an important cog in Big Tech's AI machinery. But despite its ubiquity, speech recognition is still a work in progress. Today, Facebook is heralding a major breakthrough in the way it trains these systems to learn new languages. The company says it has developed a method of building speech recognition tools that doesn't require transcribed data. Transcription is time-consuming: humans must listen to and transcribe hours of audio, a monotonous process that has to be repeated for each language. Facebook's "unsupervised" system, by contrast, learns purely from speech audio and unpaired text, giving it a sense of what human communication sounds like.
- North America (0.06)
- Europe (0.06)
- Asia > Kyrgyzstan (0.06)
Comparing Google's AI Speech Recognition To Human Captioning For Television News
Most television stations still rely on human transcription to generate the closed captioning for their live broadcasts. Yet even with the benefit of human fluency, this captioning can vary wildly in quality, even within the same broadcast, from a nearly flawless rendition to near-gibberish. At the same time, automatic speech recognition has historically struggled to achieve sufficient accuracy to entirely replace human transcription. Using a week of television news from the Internet Archive's Television News Archive, how does the station-provided, primarily human-created closed captioning compare with machine transcripts generated by Google's Cloud Speech-to-Text API? Automated high-quality captioning of live video is one of the holy grails of machine speech recognition. While machine captioning systems have improved dramatically over the years, a substantial gap has remained, holding them back from fully matching human accuracy.
- Media > Television (1.00)
- Leisure & Entertainment (1.00)
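The comparison pipeline the article describes might be sketched as follows, assuming a short WAV clip and the synchronous recognize() endpoint; the file path, audio parameters, and caption text are placeholders:

```python
# Sketch of the article's comparison: transcribe a broadcast clip with
# Google's Cloud Speech-to-Text, then score it against the station's
# closed captions with WER. The file path, encoding, and sample rate
# are placeholders; clips longer than ~1 minute need the long-running
# (asynchronous) API instead of recognize().
# Requires: pip install google-cloud-speech jiwer, plus GCP credentials.
import jiwer
from google.cloud import speech

client = speech.SpeechClient()

with open("broadcast_clip.wav", "rb") as f:      # placeholder clip
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
machine = " ".join(r.alternatives[0].transcript for r in response.results)

station_captions = "..."  # the human captioning for the same clip
print("WER vs. station captions:", jiwer.wer(station_captions, machine))
```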